In this notebook we ingest and visualize the mobility trends data provided by Apple, [APPL1].
We take the following steps:
Download the data
Import the data and summarise it
Transform the data into long form
Partition the data into subsets that correspond to combinations of geographical regions and transportation types
Make contingency matrices and corresponding heat-map plots
Make nearest neighbors graphs over the contingency matrices and plot communities
Plot the corresponding time series
About This Data
The CSV file and charts on this site show a relative volume of directions requests per country/region or city compared to a baseline volume on January 13th, 2020. We define our day as midnight-to-midnight, Pacific time. Cities represent usage in greater metropolitan areas and are stably defined during this period. In many countries/regions and cities, relative volume has increased since January 13th, consistent with normal, seasonal usage of Apple Maps. Day of week effects are important to normalize as you use this data.
Data that is sent from users’ devices to the Maps service is associated with random, rotating identifiers so Apple doesn’t have a profile of your movements and searches. Apple Maps has no demographic information about our users, so we can’t make any statements about the representativeness of our usage against the overall population.
The observations listed in this subsection are also placed under the relevant statistics in the following sections and indicated with “Observation”.
The reference date for normalizing the directions requests volumes is 2020-01-13: all the values in that column are \(100\).
From the community clusters of the nearest neighbor graphs (derived from the time series of the normalized driving directions requests volume) we see that countries and cities are clustered in expected ways. For example, in the community graph plot corresponding to “{city, driving}” the cities Oslo, Copenhagen, Helsinki, Stockholm, and Zurich are placed in the same cluster. In the graphs corresponding to “{city, transit}” and “{city, walking}” the Japanese cities Tokyo, Osaka, Nagoya, and Fukuoka are clustered together.
In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people appear to be more familiar with their trips (they request fewer directions) than, say, on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.
library(Matrix)
library(tidyverse)
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ─────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5     ✓ purrr   0.3.4
✓ tibble  3.1.3     ✓ dplyr   1.0.7
✓ tidyr   1.1.3     ✓ stringr 1.4.0
✓ readr   2.0.0     ✓ forcats 0.5.1
── Conflicts ────────────────── tidyverse_conflicts() ──
x tidyr::expand() masks Matrix::expand()
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
x tidyr::pack()   masks Matrix::pack()
x tidyr::unpack() masks Matrix::unpack()
library(ggplot2)
library(gridExtra)
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
library(d3heatmap)
Attaching package: ‘d3heatmap’
The following object is masked from ‘package:Matrix’:
print
The following objects are masked from ‘package:base’:
print, save
library(igraph)
Attaching package: ‘igraph’
The following objects are masked from ‘package:dplyr’:
as_data_frame, groups, union
The following objects are masked from ‘package:purrr’:
compose, simplify
The following object is masked from ‘package:tidyr’:
crossing
The following object is masked from ‘package:tibble’:
as_data_frame
The following objects are masked from ‘package:stats’:
decompose, spectrum
The following object is masked from ‘package:base’:
union
library(zoo)
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
library(forecast)
Apple mobility data was provided on this web page: https://www.apple.com/covid19/mobility , [APPL1]. (The data has to be downloaded from that web page; there is an “agreement to terms”, etc.)
dfAppleMobility <- read.csv( "~/Downloads/applemobilitytrends-2021-07-23.csv", stringsAsFactors = FALSE)
#dfAppleMobility <- read.csv( "~/Downloads/applemobilitytrends-2021-02-20.csv", stringsAsFactors = FALSE)
#dfAppleMobility <- read.csv("https://covid19-static.cdn-apple.com/covid19-mobility-data/2024HotfixDev18/v3/en-us/applemobilitytrends-2021-01-15.csv")
names(dfAppleMobility) <- gsub( "^X", "", names(dfAppleMobility))
names(dfAppleMobility) <- gsub( ".", "-", names(dfAppleMobility), fixed = TRUE)
dfAppleMobility
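The two gsub calls above undo the column renaming that read.csv applies by default. Here is a minimal sketch of that effect on a toy CSV string; the data and the names csvText, dfToy, and dfToy2 are made up for illustration. It also shows that passing check.names = FALSE to read.csv should give the same names directly.

```r
# Toy CSV text with ISO-date column headers (hypothetical values, not Apple's data).
csvText <- "geo_type,region,sub-region,2020-01-13,2020-01-14\ncity,Oslo,,100,101.5"

# Default import: read.csv makes the names syntactically valid,
# e.g. "2020-01-13" becomes "X2020.01.13" and "sub-region" becomes "sub.region".
dfToy <- read.csv(text = csvText, stringsAsFactors = FALSE)
mangledNames <- names(dfToy)

# The same cleanup as in the notebook: drop the leading "X", turn "." back into "-".
names(dfToy) <- gsub("^X", "", names(dfToy))
names(dfToy) <- gsub(".", "-", names(dfToy), fixed = TRUE)

# Alternative: keep the original headers at import time.
dfToy2 <- read.csv(text = csvText, stringsAsFactors = FALSE, check.names = FALSE)

identical(names(dfToy), names(dfToy2))
```

Note that the gsub cleanup would also alter any genuine column name that starts with "X" or contains a dot, so the check.names = FALSE route may be the safer of the two.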
Observation: The reference date for normalizing the directions requests volumes is 2020-01-13: all the values in that column are \(100\).
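An observation like this one can be checked programmatically. The sketch below uses a small made-up data frame (dfToy is hypothetical); with the real data the same expression would be applied to dfAppleMobility.

```r
# Toy wide-form data (hypothetical values); check.names = FALSE keeps the date headers.
dfToy <- data.frame(
  region = c("Oslo", "Tokyo"),
  `2020-01-13` = c(100, 100),
  `2020-01-14` = c(101.5, 98.2),
  check.names = FALSE
)

# TRUE when every non-missing value in the reference-date column equals 100.
refIsBaseline <- all(dfToy[["2020-01-13"]] == 100, na.rm = TRUE)
refIsBaseline
```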
Data dimensions:
dim(dfAppleMobility)
[1] 4691 564
Data summary:
summary(as.data.frame(unclass(dfAppleMobility[,1:3]), stringsAsFactors = TRUE))
           geo_type                    region      transportation_type
 city          : 790   Washington County:  27   driving:3048
 country/region: 153   Jefferson County :  25   transit: 551
 county        :2638   Montgomery County:  24   walking:1092
 sub-region    :1110   Franklin County  :  22
                       Madison County   :  21
                       Jackson County   :  19
                       (Other)          :4553
Number of unique “country/region” values:
dfAppleMobility %>%
dplyr::filter( geo_type == "country/region") %>%
dplyr::pull("region") %>%
unique %>%
length
[1] 63
Number of unique “city” values:
dfAppleMobility %>%
dplyr::filter( geo_type == "city") %>%
dplyr::pull("region") %>%
unique %>%
length
[1] 295
All unique geo types:
lsGeoTypes <- unique(dfAppleMobility[["geo_type"]])
lsGeoTypes
[1] "country/region" "city" "sub-region" "county"
All unique transportation types:
lsTransportationTypes <- unique(dfAppleMobility[["transportation_type"]])
lsTransportationTypes
[1] "driving" "walking" "transit"
It is better to have the data in long form (narrow form). For that I am using the package “tidyr”.
# lsIDColumnNames <- c("geo_type", "region", "transportation_type") # For the initial dataset released by Apple.
lsIDColumnNames <- c("geo_type", "region", "transportation_type", "alternative_name", "sub-region", "country" )
dfAppleMobilityLongForm <- tidyr::pivot_longer( data = dfAppleMobility, cols = setdiff( names(dfAppleMobility), lsIDColumnNames), names_to = "Date", values_to = "Value" )
dim(dfAppleMobilityLongForm)
[1] 2617578 8
Remove the rows with “empty” values:
dfAppleMobilityLongForm <- dfAppleMobilityLongForm[ complete.cases(dfAppleMobilityLongForm), ]
dim(dfAppleMobilityLongForm)
[1] 2583992 8
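To make the reshaping concrete, here is a small sketch of the same two steps (pivoting to long form, then dropping incomplete rows) on a made-up wide data frame. The names dfWide, lsIDs, and dfLong are hypothetical.

```r
library(tidyr)

# Toy wide-form data: one row per region, one column per date (hypothetical values).
dfWide <- data.frame(
  region = c("Oslo", "Tokyo"),
  transportation_type = c("driving", "driving"),
  `2020-01-13` = c(100, 100),
  `2020-01-14` = c(101.5, NA),
  check.names = FALSE
)

# Every column that is not an identifier is treated as a date column.
lsIDs <- c("region", "transportation_type")
dfLong <- tidyr::pivot_longer(
  data = dfWide,
  cols = setdiff(names(dfWide), lsIDs),
  names_to = "Date",
  values_to = "Value"
)
dimBefore <- dim(dfLong)

# Drop the rows that contain missing values.
dfLong <- dfLong[complete.cases(dfLong), ]
dimAfter <- dim(dfLong)
```

Each of the two rows contributes one long-form row per date column, so the toy result has 2 × 2 = 4 rows before the missing value is removed. The same arithmetic holds for the real data: 4691 rows × 558 date columns = 2617578 rows.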
Add the “DateObject” column:
dfAppleMobilityLongForm$DateObject <- as.POSIXct( dfAppleMobilityLongForm$Date, format = "%Y-%m-%d", origin = "1970-01-01" )
Add “day name” (“day of the week”) field:
dfAppleMobilityLongForm$DayName <- weekdays(dfAppleMobilityLongForm$DateObject)
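The day names returned by weekdays depend on the locale of the R session. If that matters, a locale-independent weekday number can be derived as well. The following is only a sketch with made-up dates; it uses as.Date rather than the as.POSIXct call of the notebook, and the names lsDates, dateObjs, and isoDay are hypothetical.

```r
# A few date strings in the same format as the "Date" column.
lsDates <- c("2020-01-13", "2020-01-18", "2020-01-19")

# Date objects carry no time-of-day or time zone.
dateObjs <- as.Date(lsDates, format = "%Y-%m-%d")

# ISO weekday number: 1 = Monday, ..., 7 = Sunday (independent of the locale).
isoDay <- as.integer(format(dateObjs, "%u"))

# Sundays can then be selected without comparing against a localized day name.
isSunday <- isoDay == 7
```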
Here is a sample of the transformed data:
set.seed(3232)
dfAppleMobilityLongForm %>% dplyr::sample_n( 10 )
Here is a summary:
summary(as.data.frame(unclass(dfAppleMobilityLongForm), stringsAsFactors = TRUE))
           geo_type                      region        transportation_type
 city          : 438458   Washington County:  14995   driving:1669995
 country/region:  84915   Jefferson County :  13887   transit: 306463
 county        :1465186   Montgomery County:  13338   walking: 607534
 sub-region    : 595433   Franklin County  :  12214
                          Madison County   :  11661
                          Jackson County   :  10551
                          (Other)          :2507346
              alternative_name            sub.region                  country
                        :2018380             : 735301   United States:1722894
 AB                     :   1669   Texas     : 133831   Japan        : 122843
 ACT                    :   1669   California:  92238                :  84915
 Andalucía              :   1669   Georgia   :  72747   France       :  50018
 Bayern                 :   1669   Virginia  :  68307   Germany      :  47776
 BC|Colombie-Britannique:   1669   Florida   :  67241   Thailand     :  37752
 (Other)                : 557267   (Other)   :1414327   (Other)      : 517794
         Date               Value               DateObject
 2020-01-13:   4652   Min.   :   0.44   Min.   :2020-01-13 00:00:00
 2020-01-14:   4652   1st Qu.:  89.28   1st Qu.:2020-06-01 00:00:00
 2020-01-15:   4652   Median : 119.95   Median :2020-10-18 00:00:00
 2020-01-16:   4652   Mean   : 128.60   Mean   :2020-10-17 16:38:35
 2020-01-17:   4652   3rd Qu.: 156.92   3rd Qu.:2021-03-06 00:00:00
 2020-01-18:   4652   Max.   :2148.12   Max.   :2021-07-23 00:00:00
 (Other)   :2556080
      DayName
 Friday   :367508
 Monday   :368574
 Saturday :367508
 Sunday   :367508
 Thursday :372160
 Tuesday  :368574
 Wednesday:372160
Partition the data into geo types × transportation types:
dfAppleMobilityLongForm %>%
dplyr::group_by( geo_type, transportation_type) %>%
dplyr::count()
aQueries <- split(dfAppleMobilityLongForm, dfAppleMobilityLongForm[,c("geo_type", "transportation_type")] )
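Here is a small sketch of how split behaves when it is given two grouping columns. The toy data frame dfToy is made up. The point is the naming of the parts ("geo type", a dot, "transportation type") and the fact that combinations that do not occur in the data still get an empty part by default.

```r
# Toy long-form data (hypothetical values).
dfToy <- data.frame(
  geo_type = c("city", "city", "country/region"),
  transportation_type = c("driving", "walking", "driving"),
  Value = c(100, 95, 110)
)

# One part per combination of the two columns, including unobserved combinations.
aToy <- split(dfToy, dfToy[, c("geo_type", "transportation_type")])
names(aToy)

# With drop = TRUE only the observed combinations are kept.
aToyObserved <- split(dfToy, dfToy[, c("geo_type", "transportation_type")], drop = TRUE)
names(aToyObserved)
```

In the toy data there is no "country/region" row for "walking", so that part is an empty data frame unless drop = TRUE is used.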
We can visualize the data using heat-map plots.
Remark: Using the contingency matrices prepared for the heat-map plots we can do further analysis, like, finding correlations or nearest neighbors. (See below.)
Cross-tabulate dates with regions:
aMatDateRegion <- purrr::map( aQueries, function(dfX) { xtabs( formula = Value ~ Date + region, data = dfX, sparse = TRUE ) } )
aMatDateRegion <- aMatDateRegion[ purrr::map_lgl(aMatDateRegion, function(x) nrow(x) > 0 ) ]
dfPlotQuery <- purrr::map_df( aMatDateRegion, Matrix::summary, .id = "Type" )
head(dfPlotQuery)
555 x 295 sparse Matrix of class "dgCMatrix", with 163725 entries
Type i j x
1 city.driving 1 1 100.00
2 city.driving 2 1 100.73
3 city.driving 3 1 102.86
4 city.driving 4 1 102.65
5 city.driving 5 1 109.39
6 city.driving 6 1 109.62
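The cross-tabulation step can be tried on a tiny made-up data set. The sketch below (dfToy and matToy are hypothetical names) builds a sparse date-by-region matrix with xtabs and then converts it back to triplets with Matrix::summary, which is the form used for the ggplot2 heat-map above.

```r
library(Matrix)

# Toy long-form data: two dates, two regions (hypothetical values).
dfToy <- data.frame(
  Date = c("2020-01-13", "2020-01-14", "2020-01-13", "2020-01-14"),
  region = c("Oslo", "Oslo", "Tokyo", "Tokyo"),
  Value = c(100, 101.5, 100, 98.2)
)

# Rows are dates, columns are regions, entries are the summed values.
matToy <- xtabs(Value ~ Date + region, data = dfToy, sparse = TRUE)
dim(matToy)

# Triplet form: row index i, column index j, value x.
dfTriplets <- Matrix::summary(matToy)
dfTriplets
```

A (Date, region) pair that is absent from the data becomes an implicit zero in the sparse matrix and does not appear among the triplets.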
ggplot2::ggplot(dfPlotQuery) +
ggplot2::geom_tile( ggplot2::aes( x = j, y = i, fill = log10(x)), color = "white") +
ggplot2::scale_fill_gradient(low = "white", high = "blue") +
ggplot2::xlab("Region") + ggplot2::ylab("Date") +
ggplot2::facet_wrap( ~Type, scales = "free", ncol = 2)
Here we take a “closer look” at one of the plots using a dedicated d3heatmap plot:
d3heatmap::d3heatmap( x = aMatDateRegion[["country/region.driving"]], Rowv = FALSE )
Warning in RColorBrewer::brewer.pal(n, pal) :
n too large, allowed maximum for palette RdYlBu is 11
Returning the palette you asked for with that many colors
Here we create nearest neighbor graphs from the contingency matrices computed above, cluster the nodes, and plot the clusters:
th <- 0.94
aNNGraphs <-
  purrr::map( aMatDateRegion, function(m) {
    # Correlations between the regions (columns) over the dates (rows)
    m2 <- cor(as.matrix(m))
    # Zero the diagonal so that no region is linked to itself
    diag(m2) <- 0
    m2 <- as( m2, "dgCMatrix")
    # Keep only the correlations above the threshold
    m2@x[ m2@x <= th ] <- 0
    #m2@x[ m2@x > th ] <- 1
    igraph::graph_from_adjacency_matrix(Matrix::drop0(m2), weighted = TRUE, mode = "undirected")
  })
ind <- 3
ceb <- igraph::cluster_edge_betweenness(aNNGraphs[[ind]])
igraph::plot_dendrogram(ceb, mode = "hclust", main = names(aNNGraphs)[[ind]])
plot(ceb, aNNGraphs[[ind]], vertex.size = 1, vertex.label = NA, main = names(aNNGraphs)[[ind]])
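The graph construction can be illustrated on a toy matrix with four made-up "regions", where A and B move together and so do C and D. This is only a sketch of the thresholding idea; it works on a dense matrix instead of a dgCMatrix, and the names matToy, mCor, and gToy are hypothetical.

```r
library(igraph)

# Toy date-by-region matrix: columns A, B are nearly proportional; so are C, D.
x <- 1:10
cVals <- c(5, 1, 4, 2, 3, 5, 1, 4, 2, 3)
wiggle <- rep(c(0.1, -0.1), times = 5)
matToy <- cbind(A = x, B = 2 * x + wiggle, C = cVals, D = cVals + wiggle)

# Correlations between the columns; no self-links.
mCor <- cor(matToy)
diag(mCor) <- 0

# Keep only the strong correlations as weighted edges.
th <- 0.94
mCor[mCor <= th] <- 0
gToy <- igraph::graph_from_adjacency_matrix(mCor, weighted = TRUE, mode = "undirected")

# Connected components of the thresholded graph.
compToy <- igraph::components(gToy)

# Community detection as in the notebook.
cebToy <- igraph::cluster_edge_betweenness(gToy)
```

With this toy data only the pairs A, B and C, D should remain linked after thresholding, so the graph splits into two components.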
In this section, for each date we sum the normalized volumes over all regions of each combination of geo type and transportation type, make the corresponding time series, and plot them.
Remark: In the plots the Sundays are indicated with orange dashed lines.
Here we make the time series:
aDateStringToDateObject <- unique( dfAppleMobilityLongForm[, c("Date", "DateObject")] )
aDateStringToDateObject <- setNames( aDateStringToDateObject$DateObject, aDateStringToDateObject$Date )
aDateStringToDateObject <- as.POSIXct(aDateStringToDateObject)
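# Sum the values over the regions for each date, per geo type and transportation type
aTSDirReqByCountry <- purrr::map( aMatDateRegion, function(m) rowSums(m) )
# Note: cbind() recycles the shorter vectors when the series differ in length (hence the warning below)
matTS <- do.call( cbind, aTSDirReqByCountry)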
Warning in (function (..., deparse.level = 1) :
number of rows of result is not a multiple of vector length (arg 1)
zooObj <- zoo::zoo( x = matTS, as.POSIXct(rownames(matTS)) )
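The cbind warning above indicates that the series do not all have the same number of dates, in which case cbind recycles the shorter ones. One possible way to avoid this, sketched here on made-up named vectors, is to align the series on the union of their dates and leave the missing dates as NA. This is an alternative approach and not what the notebook does; aToyTS, lsAllDates, and matToyTS are hypothetical names.

```r
# Toy series keyed by date strings; the second one lacks 2020-01-14 (hypothetical values).
aToyTS <- list(
  city.driving = c("2020-01-13" = 100, "2020-01-14" = 101, "2020-01-15" = 103),
  city.transit = c("2020-01-13" = 100, "2020-01-15" = 97)
)

# The union of all dates, in chronological order (ISO dates sort correctly as strings).
lsAllDates <- sort(unique(unlist(lapply(aToyTS, names))))

# Look every series up by date; dates missing from a series become NA.
matToyTS <- sapply(aToyTS, function(v) unname(v[lsAllDates]))
rownames(matToyTS) <- lsAllDates
matToyTS
```

Whether NA values are acceptable depends on the downstream steps, so this alignment would need to be checked against the plotting and forecasting code before being adopted.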
Here we plot them:
autoplot(zooObj) +
aes(colour = NULL, linetype = NULL) +
facet_grid(Series ~ ., scales = "free_y") +
geom_vline( xintercept = aDateStringToDateObject[weekdays(aDateStringToDateObject) == "Sunday"], color = "orange", linetype = "dashed", size = 0.3 )
Observation: In the time series plots the Sundays are indicated with orange dashed lines. We can see that from Monday to Thursday people appear to be more familiar with their trips (they request fewer directions) than, say, on Fridays and Saturdays. We can also see that on Sundays people (on average) are more familiar with their trips or simply travel less.
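Since the weekly pattern is so pronounced, one simple way to look past day-of-week effects is a trailing 7-day average. The sketch below uses a made-up series with a purely weekly pattern; it is an illustration of the idea, not a step of this notebook. Note the explicit stats:: prefix, because dplyr::filter masks stats::filter once the tidyverse is loaded.

```r
# Toy daily series: three identical weeks (hypothetical values).
weekPattern <- c(100, 102, 104, 106, 120, 130, 90)
x <- rep(weekPattern, times = 3)

# Trailing 7-day mean: each value averages the current day and the six days before it.
smoothed <- stats::filter(x, filter = rep(1 / 7, 7), sides = 1)

# The first six values are NA because a full week is not yet available.
head(as.numeric(smoothed), 8)
```

For a series that repeats exactly every seven days, the smoothed values are constant and equal to the weekly mean, which shows that the weekly pattern has been averaged out.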
Here we do “forecasts” for code-workflow demonstration purposes; the forecasts should not be taken seriously.
Fit a time series model to the time series:
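# Fit ARIMA models (note: the next assignment overwrites these fits)
aTSModels <- purrr::map( names(zooObj), function(x) { forecast::auto.arima( zoo( x = zooObj[,x], order.by = index(zooObj) ) ) } )
# Forecast each series directly with forecast::forecast
aTSModels <- purrr::map( names(zooObj), function(x) forecast::forecast( as.matrix(zooObj)[,x] ) )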
names(aTSModels) <- names(zooObj)
Plot data and forecast:
lsPlots <- purrr::map( names(aTSModels), function(x) autoplot(aTSModels[[x]]) + ylab("Volume") + ggtitle(x) )
names(lsPlots) <- names(aTSModels)
do.call( gridExtra::grid.arrange, lsPlots )
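For completeness, here is a minimal sketch of a fit-then-forecast workflow in which the forecast is produced from the fitted ARIMA model itself. It uses a synthetic series with a weekly pattern (the data and the names tsToy, fitToy, and fcToy are made up), and, as above, the result is for workflow demonstration only.

```r
library(forecast)

# Synthetic daily series with a weekly pattern plus noise (hypothetical data).
set.seed(3232)
nDays <- 70
x <- 100 + 10 * sin(2 * pi * seq_len(nDays) / 7) + rnorm(nDays, sd = 2)
tsToy <- ts(x, frequency = 7)

# Fit the model, then forecast one week ahead from that same fit.
fitToy <- forecast::auto.arima(tsToy)
fcToy <- forecast::forecast(fitToy, h = 7)

# Number of forecasted points.
length(fcToy$mean)
```

The object fcToy can be passed to autoplot in the same way as the elements of aTSModels.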
[APPL1] Apple Inc., Mobility Trends Reports, (2020), apple.com.
[AA1] Anton Antonov, “Apple mobility trends data visualization”, (2020), SystemModeling at GitHub.
[AA2] Anton Antonov, “NY Times COVID-19 data visualization”, (2020), SystemModeling at GitHub.